fix(tgi): honor S3 model_path as weight source for TGI builds (#5943)#5964
Open
sagarneeldubey wants to merge 1 commit into
Open
fix(tgi): honor S3 model_path as weight source for TGI builds (#5943)#5964sagarneeldubey wants to merge 1 commit into
sagarneeldubey wants to merge 1 commit into
Conversation
ModelBuilder with ModelServer.TGI silently ignored an S3 weight source (model_path="s3://..." or s3_model_data_url="s3://..."): it created a literal local "s3:/..." directory and set HF_MODEL_ID to the HF repo id, so the container always downloaded weights from huggingface.co. _build_for_tgi now detects an S3 weight source, skips the local mkdir for it, attaches the S3 prefix as an uncompressed ModelDataSource, and sets HF_MODEL_ID=/opt/ml/model and HF_HUB_OFFLINE=1 (via setdefault, preserving any user-supplied HF_MODEL_ID). Genuine local paths, HF-Hub downloads, JumpStart, and all non-TGI servers are unchanged. Also fixes two defects surfaced during real deployment: - HF_HUB_OFFLINE was reset to "0" at the end of the build; it now stays "1" for the S3-mounted path so TGI loads from /opt/ml/model. - The uncompressed ModelDataSource S3Uri could end in "//" when the input prefix already had a trailing slash; normalized to exactly one slash so S3Prefix matching finds the weight objects. Adds unit and regression tests for all of the above.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue
Fixes #5943
Summary
ModelBuilderwithModelServer.TGIsilently ignored an S3 weight source supplied viamodel_path="s3://..."ors3_model_data_url="s3://...". It created a literal locals3:/...directory and setHF_MODEL_IDto the HF repo id, so the deployed container always downloaded weights from huggingface.co. This forces a ~10-12 min cold start on every scale-out, makes scale-to-zero async endpoints impractical, creates a hard dependency on huggingface.co, and blocks deploying custom fine-tuned weights not published on the Hub._build_for_tginow:ModelDataSource(S3DataType=S3Prefix,CompressionType=None) by routing through_prepare_for_mode(model_path=...).HF_MODEL_ID=/opt/ml/modelandHF_HUB_OFFLINE=1viasetdefault, preserving any user-suppliedHF_MODEL_ID.Because TGI's
HF_MODEL_IDdoes not accept an S3 URI (it expects an HF repo id or a local path), the fix mounts the weights at/opt/ml/modelrather than passing the URI through the env var.Genuine local paths, HF-Hub downloads, JumpStart, and all non-TGI servers are unchanged. The change is scoped strictly to the TGI path and does not touch the DJL behavior of #5529 / #5588.
Additional defects fixed (surfaced during real deployment)
HF_HUB_OFFLINEreset: the end-of-build reset set it back to"0", defeating offline loading. It now stays"1"for the S3-mounted path so TGI loads from/opt/ml/modelinstead of phoning home.ModelDataSourceS3Uricould becomes3://.../prefix//when the input prefix already ended in/. WithS3Prefixmatching, no objects match...prefix//, so zero files mount into/opt/ml/modeland TGI fails to find weights. Normalized to exactly one trailing slash.Testing
TestBuildForTGIcases: S3model_pathskips_create_dir_structure; container getsHF_MODEL_ID=/opt/ml/model+HF_HUB_OFFLINE=1;s3_model_data_urlroutes through the S3 branch; user-suppliedHF_MODEL_IDpreserved;HF_HUB_OFFLINEsurvives the post-build reset.test_tgi_server.py: regression test asserting theS3Urihas exactly one trailing slash (no//).End-to-end validation (real SageMaker inference endpoint, TGI DLC, weights from S3)
Beyond unit tests, the fix was validated against a live SageMaker inference endpoint using the TGI Deep Learning Container, loading model weights directly from an S3 prefix (no HuggingFace download). The build step was sanity-checked to confirm the container config before deploying, and the endpoint then mounted the weights from S3 and served successfully:
Observed: no local
s3:/...directory was created; the built container hadHF_MODEL_ID=/opt/ml/model,HF_HUB_OFFLINE=1, and aModelDataSourcepointing at the S3 prefix; the endpoint reachedInServiceand served inference with weights mounted from S3 (CloudWatch shows the container loading from the mounted path rather than downloading from huggingface.co).Future work (out of scope for this PR)
This PR is intentionally scoped to the TGI backend (the subject of #5943). The same "S3 as a weight source" capability should be extended to the other HF-Hub-download backends so behavior is consistent across
ModelBuilder(see https://sagemaker.readthedocs.io/en/stable/ ):HF_MODEL_ID/OPTION_MODEL_IDcan accept an S3 URI directly. Must avoid regressing the DJL env-var handling in bug: ModelBuilder overwrites user-provided HF_MODEL_ID for DJL Serving, preventing S3 model loading #5529 and the DJL passthrough in ModelBuilder fails w/ DJL container using HF_MODEL_ID #5588.ModelDataSourceconstruction (the trailing-slash normalization currently exists per-adapter) to avoid duplicating this logic across backends.Related: #5529, #5588.
Backward compatibility
Additive and TGI-scoped. Every non-S3 / non-TGI code path is reached exactly as before.